This contribution describes a search for the associated production of Standard Model vector
bosons $WZ$, where the $W$ boson decays leptonically ($W \to \ell\nu$) and the $Z$ boson decays to a heavy
flavor quark pair ($Z \to b\bar{b}, c\bar{c}$). At least one identified ("tagged") heavy flavor jet in the final state
is required. Given the small di-jet invariant mass separation between the $W$ and the $Z$ resonances,
the production of $WW$ events where one $W$ decays leptonically and the second $W$ decays into a
heavy flavor jet (i.e. $W^+ \to c\bar{s}$) contributes to our signal.

This search uses data collected with the CDF II detector at the Tevatron Collider at Fermilab,
corresponding to an integrated luminosity of approximately 7.5 fb$^{-1}$. Events consistent with
the signature of a charged lepton (electron or muon), large missing transverse energy and exactly
two jets, of which at least one is required to contain a secondary vertex displaced from the jet origin,
are selected. A multivariate discriminant based on the Support Vector Machine algorithm is used
to greatly reduce the multi-jet background contamination.

We observe a signal with a significance of $3.03\sigma$ over the background-only hypothesis. A cross section
of 1.085 1 40 times the expected Standard Model value for the combined WZ/WW production and 
decay into heavy flavors is measured, consistent with the Standard Model prediction. 

This contribution describes the search for $p\bar{p} \to WZ \to \ell\nu b\bar{b}$ (or $\to \ell\nu c\bar{c}$). The signature for this process is a
$W$ boson, decaying to a high-$P_T$ charged lepton and a neutrino, plus a $Z$ boson decaying to two jets containing
heavy flavor quarks. This signature is very similar to the one used in the search for a low mass Higgs boson
($M_H < 140$ GeV/$c^2$), where the $H \to b\bar{b}$ branching fraction is large and the particle is produced in association
with a $W$ boson. Therefore the identification of the $WZ$ signal in the channel containing heavy flavor jets
represents a benchmark in the search for the low mass Higgs.

We base our signal-to-background discrimination on the invariant mass distribution of the high-$P_T$ jet pair
entering our selection. Therefore, since we use a secondary vertex finding algorithm to identify jets
produced by $b$ quarks ($b$-tagging), we also consider the process $WW \to \ell\nu cs$ as part of our signal. We identify about
8% of the secondary vertices produced by charmed hadrons coming from $WW$ events. On the other hand, we
identify more than 60% of the $WZ$ events decaying into heavy flavors.

The main backgrounds for the signal processes include $W$+jets production (where the jets contain either
tagged heavy flavor or mis-tagged light flavor), top quark production, and multi-jet production where one
jet is misidentified as a lepton. Because we increase the acceptance of our signal by using several triggers and lepton
identification algorithms, we use a multi-jet rejection algorithm based on a Support Vector Machine discriminant
that exploits the kinematics of the event.

The CDF detector is described in detail in [1].

I. DATA SAMPLE & EVENT SELECTION 

This analysis is based on an integrated luminosity of 7.5 fb$^{-1}$ collected with the CDF II detector between
March 2002 and March 2011. We select events consistent with the signature of a $W$ boson leptonic decay, large
missing transverse energy and exactly two energetic $b$-quark jets. We accept tight charged lepton candidates,
loose charged lepton candidates and isolated tracks; by construction these lepton categories are orthogonal to
each other. The data containing tight leptons are collected with an inclusive lepton trigger that requires an
electron (muon) with transverse energy, $E_T$, greater than 18 GeV (transverse momentum, $P_T$, greater than
18 GeV/$c$). The data containing loose leptons and isolated tracks are collected using triggers based on missing
transverse energy ($\not\!E_T$) and jet information.

In the following we refer to these categories of events: 

• CEM: central tight electrons; 

• CMUP and CMX: central tight muons; 

• EMC (extended muon categories): loose muons and isolated track lepton candidates. 






Proceedings of the DPF-2011 Conference, Providence, RI, August 8-13, 2011 



We select events consistent with a $W$-boson decay plus two energetic $b$-quark jets. The $W$-boson events are
selected by requiring a single, isolated electron (muon) with $E_T$ ($P_T$) > 20 GeV (GeV/$c$), central in the detector
(absolute pseudorapidity in the detector coordinate system, $|\eta_{Det}|$, less than 1.1), and $\not\!E_T$ > 20 GeV (> 10 GeV for
CMUP and CMX). Exactly two central ($|\eta_{Det}| < 2.0$) jets with $E_T^{cor}$ > 20 GeV (energy corrected for detector
effects) are required. In order to improve the separation of signal and background events, we require that at
least one of the two jets is identified as originating from a heavy quark by the Secondary Vertex tagger SecVtx [2].
We further suppress the non-$W$ multi-jet background using a multivariate technique (see Section I A).
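
The selection requirements listed above can be summarized as a single predicate. The following Python sketch is illustrative only and is not the CDF analysis code: the event layout (the dictionary keys `lepton`, `met`, `jets`, `et_or_pt`, `eta_det`, `et_cor`, `secvtx_tag`) is hypothetical, and it assumes that the 20 GeV (10 GeV for CMUP and CMX) threshold applies to the missing transverse energy.

```python
# Categories for which the missing-E_T threshold is relaxed (assumed reading of the text).
RELAXED_MET_CATEGORIES = {"CMUP", "CMX"}


def passes_selection(event):
    """Return True if a hypothetical event record passes the W + 2 jets cuts."""
    lepton = event["lepton"]
    # Lepton: E_T (electron) or P_T (muon) above 20 GeV, central in the detector.
    if lepton["et_or_pt"] <= 20.0 or abs(lepton["eta_det"]) >= 1.1:
        return False
    # Missing transverse energy: > 20 GeV, or > 10 GeV for CMUP and CMX.
    met_cut = 10.0 if lepton["category"] in RELAXED_MET_CATEGORIES else 20.0
    if event["met"] <= met_cut:
        return False
    # Exactly two jets with corrected E_T > 20 GeV and |eta_det| < 2.0.
    jets = [j for j in event["jets"]
            if j["et_cor"] > 20.0 and abs(j["eta_det"]) < 2.0]
    if len(jets) != 2:
        return False
    # At least one of the two jets must carry a SecVtx tag.
    return any(j["secvtx_tag"] for j in jets)


example_event = {
    "lepton": {"category": "CEM", "et_or_pt": 35.0, "eta_det": 0.4},
    "met": 28.0,
    "jets": [
        {"et_cor": 45.0, "eta_det": 0.9, "secvtx_tag": True},
        {"et_cor": 27.0, "eta_det": -1.3, "secvtx_tag": False},
    ],
}

if __name__ == "__main__":
    print(passes_selection(example_event))
```

The multi-jet veto described in Section I A would be applied on top of this predicate.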



A. Suppression of non-W Multi-jet Background 

A fake $W$-boson-like signature can be generated when one jet fakes a high-$P_T$ lepton and the $\not\!E_T$ comes from
jet energy mis-measurement. We developed a method to suppress this so-called multi-jet background using
a multivariate technique based on the Support Vector Machine (SVM) algorithm [3]. We developed a
software package, based on the LibSVM [4] library, that performs algorithm training, variable ranking, signal
discrimination and robustness tests.

Although we trained the discriminant on a central electron sample, we apply it to all our selected data-sets
(electrons, tight muons, loose muons and tracks) because the algorithm is based only on kinematic variables. We
achieve a large reduction of the multi-jet contamination in all the lepton categories, while maintaining a very high
efficiency on the signature $p\bar{p} \to \ell\nu + jj$.



1. Summary and Results of the SVM Training 



The test and training sample used to develop the multi-jet veto was built with the following W selection 
requirements: 

• one high energy, isolated central electron; 

• exactly two jets reconstructed with $|\eta_{Det}| < 2.0$ and $E_T^{cor} > 20$ GeV;

• presence of missing transverse energy as signature of the escaping neutrino. 

Multi-jet events can pass the same requirements if one of the jets fakes the electron and the $\not\!E_T$ is either
mis-measured by the detector, faked by mis-identified (or undetected) minimum ionizing particles, or produced by
neutrinos associated with the decay of heavy quarks. We built our training set using 8000 signal events and 4000
background events (to emulate the data composition):

signal: $W$ + 2 partons Alpgen Monte Carlo [5], where the $W$ is forced to decay into an electron and a neutrino.
We have « 10^ generated events and we keep sa 9 x 10^ events as a control sample (i.e. not used for training). 

background: due to the nature of the background (a mixture of physics processes and detector response), there
are no simulated models that can be trusted to provide the accurate description needed for training. Therefore
we use a data-driven approach to obtain a suitable sample: we select events with a fake electron by reversing
some of the "electron quality" requirements (at least 2 out of the 5 cuts) used to identify the shape of the
electromagnetic shower in the calorimeter. This selection is named "anti-electron" and produces a multi-jet
enriched sample which is, however, statistically limited to a few thousand events. Furthermore, we cannot
rely on the modeling of the variables directly correlated to the reversed electron cuts.
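
The "anti-electron" definition amounts to counting how many of the five quality requirements a candidate fails. A minimal Python sketch of that logic follows; the function name and the representation of the cuts as a list of five booleans are assumptions made for illustration, since the individual cuts are not named here.

```python
def is_anti_electron(quality_cuts_passed, min_failed=2):
    """Return True if at least `min_failed` of the five quality cuts are failed.

    quality_cuts_passed: sequence of five booleans, one per shower-shape
    quality requirement (hypothetical representation).
    """
    if len(quality_cuts_passed) != 5:
        raise ValueError("expected exactly five quality-cut results")
    failed = sum(1 for passed in quality_cuts_passed if not passed)
    return failed >= min_failed


if __name__ == "__main__":
    # A candidate failing two of the five cuts enters the multi-jet enriched sample.
    print(is_anti_electron([True, False, True, False, True]))
```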

The variable-sorting algorithm produced an optimal SVM using 6 variables as input features: 

• $W$ related variables: lepton $P_T$, $\not\!E_T$, $\Delta\phi(e, \not\!E_T^{raw})$;

• Jet related variables: $E_T^{raw}$ and $E_T^{cor}$ of the second most energetic jet;

• Global variables: $\not\!E_T$ significance (a variable that relates $\not\!E_T$ with jet corrections).

The best SVM configuration reduces the fraction of background in data to < 10%, and it has an efficiency
on signal events of $\epsilon \approx 95\%$ (from MC).
II. BACKGROUNDS



Since our final state has the signature of a charged lepton, missing transverse energy and two jets (a $W$ boson and jets signature),
the following background sources are considered:

Non-W/Multi-jet: a $W$-boson-like signature is generated when one jet fakes a high-$P_T$ lepton and $\not\!E_T$ is
generated through jet energy mis-measurement (more details can be found in Section I A).

W + Mistags: this background occurs when one or more light flavor jets produced in association with a $W$
boson are mistakenly identified as a heavy flavor jet by the $b$-tagging algorithm. Mistags are generated because
of the finite resolution of the tracking detectors, material interactions, or long-lived light flavor hadrons
($\Lambda$ and $K_s$) which produce real displaced vertices.

W + Heavy Flavor: these processes ($W + b\bar{b}$, $W + c\bar{c}$ and $W + c$) involve the production of heavy flavor
quarks in association with a $W$ boson.

Other Electroweak Backgrounds: additional small but non-negligible background contributions come
from single top quark production, top quark pair production and $Z$ boson + jets production.

We determine the amount of selected $W$+jets events for each lepton category by fitting the $\not\!E_T$ distribution
of the pretag data control sample: for the Top and Electroweak components the MC templates are normalized to
the theoretical expectation, while for $W$+jets and Non-W the normalization is free to float in the likelihood fit.
The following samples are used to produce the non-W templates:

• modified anti-electrons for the central electron fakes;

• non-isolated (iso > 0.2) tight muons for the central tight muon fakes;

• non-isolated (iso > 0.2) loose muons to mimic the EMC categories.

As expected after the efficient multivariate multi-jet rejection cut, the fits return a very small multi-jet
contamination (ranging from 2.2% to 7.5% depending on the lepton category).

The $b$-tagged $W$+Heavy Flavor (HF) component is extracted from the total $W$+jets pretag sample: the
total $W$+jets sample is composed of a large set of Alpgen+Pythia [7] Monte Carlo samples weighted by their LO production
cross sections; the HF fractions are then extracted and scaled for the NLO contribution and the $b$-tagging algorithm
efficiency.

We estimate the normalization of the $W$ + Mistags background by applying the mistag matrix to the pretag data
after subtracting the non-W, top, diboson, $Z$+jets and $W$+HF contributions. We model the $W$ + Mistag
kinematics and shapes using $W$ + Light Flavor Monte Carlo events, weighting each event by its mistag probability.
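
The subtraction step can be illustrated with the central values of Table I. The Python sketch below is a numerical cross-check under the assumption that the pretag $W$ + Light Flavor yield is what remains of the pretag data once all other tabulated components are removed; it ignores uncertainties and is not the fit procedure itself. The dictionary layout and function name are illustrative.

```python
# Central values from Table I (pretag), per lepton category.
# Component order: ttbar, single top s, single top t, WW, WZ, ZZ, Z+jets,
#                  W+bb, W+cc, W+cj, non-W.
PRETAG_YIELDS = {
    "CEM": {
        "data": 61596,
        "others": [498, 123, 191, 1580, 216, 4.1, 1185, 1892, 4041, 3174, 4182],
    },
    "CMUP": {
        "data": 29036,
        "others": [271, 66, 101, 804, 118, 6.1, 1690, 905, 1873, 1543, 672],
    },
}


def w_light_flavor_yield(data, others):
    """Pretag data minus every non-light-flavor component (central values only)."""
    return data - sum(others)


if __name__ == "__main__":
    for category, yields in PRETAG_YIELDS.items():
        estimate = w_light_flavor_yield(yields["data"], yields["others"])
        print(category, round(estimate, 1))
```

For both categories the result agrees with the tabulated $W$ + Light Flavor entries (44509 and 20987) to within one event.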

The top quark and other electroweak backgrounds are normalized directly to their theoretical cross sections, 
calculated at next-to-leading order. 

Finally the residual tagged Non-W component is fitted to the data together with a template of all the other 
backgrounds: the two normalizations are free to float and the multi-jet one is extracted. 

More details on the background estimate can be found in Ref. [6].



Tables I, II and III summarize the number of observed and expected events in the $W$+2 jets sample, for all
lepton categories, before requiring a $b$-tag, with one $b$-tag and with two $b$-tags, respectively.



III. $M_{inv}(jet1, jet2)$ DISTRIBUTION

The signal discrimination is based on the invariant mass of the two jets, $M_{inv}(jet1, jet2)$, in the event.
Candidates are separated into four statistically independent channels: tight leptons (CEM+CMUP+CMX)
with 1 SecVtx tag, EMC lepton candidates with 1 SecVtx tag, tight leptons with 2 SecVtx tags and EMC with
2 SecVtx tags. The $M_{inv}(jet1, jet2)$ distributions for single-tagged events for tight leptons and EMC leptons are
shown in Figure 1 and Figure 2, respectively. The corresponding distributions for events with two tags are shown
in Figure 3 and Figure 4. These four distributions are used for the final signal-to-background discrimination.
The $M_{inv}(jet1, jet2)$ plots shown here are the ones returned by the final fit, with a full treatment of the correlated
systematic effects (see the next section for a complete description).
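
For reference, the di-jet invariant mass follows from the summed four-momenta of the two jets, $M_{inv}^2 = (E_1 + E_2)^2 - |\vec{p}_1 + \vec{p}_2|^2$ in units where $c = 1$. The Python sketch below shows this computation; representing each jet as an `(E, px, py, pz)` tuple in GeV is an assumption made for illustration and does not reflect the analysis software.

```python
import math


def dijet_invariant_mass(jet1, jet2):
    """Invariant mass of two jets given as (E, px, py, pz) tuples in GeV (c = 1)."""
    energy = jet1[0] + jet2[0]
    px = jet1[1] + jet2[1]
    py = jet1[2] + jet2[2]
    pz = jet1[3] + jet2[3]
    mass_squared = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mass_squared, 0.0))


if __name__ == "__main__":
    # Two back-to-back massless jets of 40 GeV each.
    print(dijet_invariant_mass((40.0, 40.0, 0.0, 0.0), (40.0, -40.0, 0.0, 0.0)))
```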



IV. STATISTICAL ANALYSIS 



Since the process $WZ/WW \to \ell\nu$ + Heavy Flavors has never been observed before, we start by evaluating 95% C.L.
limits on a potential signal. The following systematic uncertainties (for background and signal) are taken into
TABLE I: Summary of pretagged observed and expected events in the W+2 jets sample in 7.5 fb$^{-1}$ of data.

Channel           CEM              CMUP             CMX              EMC              All Channels
Pretag Data       61596            29036            18878            27946            137456
$t\bar{t}$        498 ± 46         271 ± 25         133 ± 17         418 ± 39         1320 ± 95
Single Top s      123 ± 11         66 ± 6           32 ± 4           87 ± 8           308 ± 22
Single Top t      191 ± 21         101 ± 11         52 ± 7           130 ± 14         474 ± 37
WW                1580 ± 132       804 ± 67         465 ± 56         822 ± 70         3672 ± 253
WZ                216 ± 20         118 ± 11         75 ± 10          147 ± 14         556 ± 40
ZZ                4.1 ± 0.3        6.1 ± 0.5        3.7 ± 0.4        8.1 ± 0.7        22 ± 2
Z+jets            1185 ± 147       1690 ± 209       1084 ± 163       1881 ± 234       5840 ± 728
$W + b\bar{b}$    1892 ± 759       905 ± 362        540 ± 216        843 ± 339        4180 ± 932
$W + c\bar{c}$    4041 ± 1622      1873 ± 750       1195 ± 479       1724 ± 693       8833 ± 1975
W + cj            3174 ± 1274      1543 ± 618       935 ± 375        1117 ± 449       6770 ± 1532
W + Light Flavor  44509 ± 2785     20987 ± 1104     13661 ± 741      18645 ± 1274     97803 ± 3339
Non-W             4182 ± 1673      672 ± 269        702 ± 281        2122 ± 849       7678 ± 1916



TABLE II: Summary of observed and expected events with one secondary vertex tag (SecVtx), in the W+2 jets sample,
in 7.5 fb$^{-1}$ of data.

Channel           CEM              CMUP             CMX              EMC              All Channels
Pretag Data       61596            29036            18878            27946            137456
$t\bar{t}$        201.3 ± 19.6     109.8 ± 10.7     55.0 ± 7.1       171.9 ± 16.9     538 ± 53
Single Top s      52.9 ± 4.8       28.2 ± 2.6       14.0 ± 1.8       38.0 ± 3.5       133 ± 12
Single Top t      71.4 ± 8.4       37.4 ± 4.4       19.8 ± 2.9       49.5 ± 5.8       178 ± 21
WW                68.0 ± 9.4       33.3 ± 4.6       20.3 ± 3.3       38.4 ± 5.3       160 ± 22
WZ                21.8 ± 2.3       11.5 ± 1.25      7.4 ± 1.0        14.1 ± 1.6       54.7 ± 5.9
ZZ                0.44 ± 0.04      0.65 ± 0.06      0.42 ± 0.05      0.86 ± 0.08      2.4 ± 0.2
Z+jets            27.9 ± 3.5       43.0 ± 5.5       27.3 ± 4.2       65.0 ± 8.4       163 ± 21.1
$W + b\bar{b}$    632.9 ± 254.2    309.8 ± 124.1    192.3 ± 77.1     308.9 ± 124.3    1444 ± 579
$W + c\bar{c}$    331.0 ± 133.7    155.1 ± 62.5     96.2 ± 38.8      164.2 ± 66.4     747 ± 301
W + cj            259.9 ± 105.0    127.8 ± 51.5     75.3 ± 30.4      106.4 ± 43.0     569 ± 229
Mistag            605.2 ± 71.3     283.8 ± 31.7     181.0 ± 20.6     346.2 ± 39.2     1416 ± 146
Non-W             173.9 ± 69.6     45.8 ± 18.3      2.8 ± 1.1        100.9 ± 40.4     323.3 ± 129
Prediction        2446.6 ± 503.7   1186.2 ± 242.2   691.8 ± 148.7    1404.5 ± 242.6   5729 ± 1132
Observed          2332             1137             699              1318             5486
WW/WZ             89.7 ± 10.2      44.8 ± 5.05      27.7 ± 3.9       52.5 ± 5.9       214.8 ± 24.4



account as normalization nuisance parameters: jet energy scale (JES), Alpgen $Q^2$ scale, $b$-tag scale factor, lepton identification and
trigger efficiencies, multi-jet background normalization, NLO scaling of $W$+heavy flavor production, ISR/FSR
(for signal only) and mistag uncertainty. In addition, JES and $Q^2$ are taken as shape systematics as well, where
the interpolated shape variation is used as a nuisance parameter. All the nuisance parameters are fitted to
improve the sensitivity.

Expected 95% C.L. limits, assuming no SM $WW/WZ$ production, are determined using Monte Carlo pseudo-experiments
based on expected yields varied within the assigned systematics. The normalization and shape
uncertainties are integrated into the limit calculations. Table IV summarizes the median expected (and observed)
95% production limits (in units of the SM $WW + WZ$ cross section multiplied by the branching fraction into heavy
flavors).

Combining single-tagged and double-tagged results for all lepton categories, we find an expected limit of
$0.57^{+0.33}_{-0.31}$ times the SM prediction.

The observed limit, combining single-tagged and double-tagged results for all lepton categories, is 1.46 times
the SM prediction.
TABLE III: Summary of observed and expected events with two secondary vertex tags (SecVtx), in the W+2 jets sample,
in 7.5 fb$^{-1}$ of data.

Channel           CEM              CMUP             CMX              EMC              All Channels
Pretag Data       61596            29036            18878            27946            137456
$t\bar{t}$        42.2 ± 6.1       (illegible)      11.1 ± 1.9       34.4 ± 5.0       (illegible)
Single Top s      14.1 ± 2.0       7.6 ± 1.1        3.7 ± 0.6        10.2 ± 1.4       35.6 ± 5.0
Single Top t      4.2 ± 0.7        2.3 ± 0.4        1.2 ± 0.2        3.1 ± 0.5        10.8 ± 1.7
WW                0.6 ± 0.1        0.26 ± 0.07      0.16 ± 0.04      0.33 ± 0.08      1.3 ± 0.3
WZ                4.0 ± 0.6        1.9 ± 0.3        1.4 ± 0.2        2.4 ± 0.4        9.6 ± 1.4
ZZ                0.06 ± 0.01      0.12 ± 0.02      0.09 ± 0.01      0.16 ± 0.02      0.43 ± 0.06
Z+jets            0.9 ± 0.1        2.0 ± 0.3        1.2 ± 0.2        3.1 ± 0.4        7.2 ± 1.0
$W + b\bar{b}$    81.9 ± 33.2      42.2 ± 17.1      23.4 ± 9.5       44.9 ± 18.2      192 ± 78
$W + c\bar{c}$    4.7 ± 1.9        2.3 ± 1.0        1.3 ± 0.5        2.8 ± 1.1        11.0 ± 4.5
W + cj            3.7 ± 1.5        1.9 ± 0.8        1.0 ± 0.4        1.8 ± 0.7        8.3 ± 3.4
Mistag            3.2 ± 0.7        1.6 ± 0.3        0.9 ± 0.2        2.2 ± 0.4        7.8 ± 1.6
Non-W             7.9 ± 3.2        4.8 ± 1.9        0.1 ± 0.5        0.0 ± 0.5        12.8 ± 6.1
Prediction        167.3 ± 38.0     88.9 ± 19.6      45.4 ± 10.9      105.3 ± 21.5     406.9 ± 89.5
Observed          147              74               39               106              366
WW/WZ             4.6 ± 0.6        2.1 ± 0.3        1.5 ± 0.2        2.7 ± 0.4        10.9 ± 1.5



[Figure: CDF Run II Preliminary (7.5 fb$^{-1}$)]

FIG. 1: $M_{inv}(jet1, jet2)$ distribution for the 1 SecVtx tag candidates, tight leptons (CEM+CMUP+CMX combined).
The best-fit values of the systematic nuisance parameters are taken into account.



A. Sensitivity 

To compute the significance of a potentially observed signal, we perform a hypothesis test, comparing the data
to two hypotheses. The null hypothesis, $H_0$, assumes Standard Model processes except $WW + WZ$ production.
The second hypothesis, $H_1$, assumes that the $WW + WZ$ production cross section and the branching ratio into
heavy flavors are the ones predicted by the Standard Model. The likelihood ratio is defined as:

$$Q = \frac{p(\mathrm{data} \mid H_1, \hat{\theta})}{p(\mathrm{data} \mid H_0, \hat{\hat{\theta}})} \qquad (1)$$

where $\theta$ represents the nuisance parameters describing the uncertain values of the quantities studied for
systematic error, $\hat{\theta}$ are the best-fit values of $\theta$ under $H_1$, and $\hat{\hat{\theta}}$ are the best-fit values of the nuisance parameters under
$H_0$. We perform two sets of pseudo-experiments to determine the expected sensitivity to the signal [8], one
assuming $H_0$ and a second one assuming $H_1$. In each pseudo-experiment, the values of the nuisance parameters
are chosen randomly based on the systematic errors. The distributions of the values of $-2\ln Q$ are shown in
[Figure: CDF Run II Preliminary (7.5 fb$^{-1}$)]

FIG. 2: $M_{inv}(jet1, jet2)$ distribution for the 1 SecVtx tag candidates, extended muon categories (EMC). The best-fit
values of the systematic nuisance parameters are taken into account.




FIG. 3: $M_{inv}(jet1, jet2)$ distribution for the 2 SecVtx tag candidates, tight leptons (CEM+CMUP+CMX combined).
The best-fit values of the systematic nuisance parameters are taken into account.



Fig. 5 for the two hypotheses and the data.

The p-value is the probability that $-2\ln Q < -2\ln Q_{obs}$, assuming the null hypothesis $H_0$. The p-value was
found to be 0.00120, corresponding to a $3.03\sigma$ excess.
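
The conversion from a p-value to a number of standard deviations can be reproduced with the one-sided Gaussian tail convention. The Python sketch below assumes that convention (it is the one that matches the quoted numbers) and uses only the standard library; the function name is illustrative.

```python
from statistics import NormalDist


def p_value_to_sigma(p_value):
    """One-sided Gaussian significance Z such that P(X > Z) = p_value."""
    return NormalDist().inv_cdf(1.0 - p_value)


if __name__ == "__main__":
    print(round(p_value_to_sigma(0.00120), 2))
```

For a p-value of 0.00120 this gives a significance close to $3.03\sigma$, consistent with the quoted value at the precision of the rounded p-value.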

The sensitivity of the analysis is computed as the median expected p-value assuming a signal is truly present.
The median $-2\ln Q$ is extracted from the $H_1$ distribution, and the integral of the $H_0$ distribution of $-2\ln Q$ to
the left of this median value is the median expected p-value. The value obtained is 0.00126, corresponding to
$3.02\sigma$.
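
The median expected p-value procedure can be illustrated with a toy model. The Python sketch below is a deliberately simplified stand-in, not the analysis procedure: it assumes a single-bin Poisson counting experiment with known signal `s` and background `b`, has no nuisance parameters and no shape information, and all function names and numbers are hypothetical.

```python
import bisect
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's method (adequate for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def minus_two_ln_q(n, s, b):
    """-2 ln Q for n observed events, with Q = L(n | s + b) / L(n | b)."""
    return 2.0 * (s - n * math.log(1.0 + s / b))


def toy_distribution(true_mean, s, b, n_toys, rng):
    """Sorted -2 ln Q values from pseudo-experiments generated with `true_mean`."""
    return sorted(minus_two_ln_q(sample_poisson(true_mean, rng), s, b)
                  for _ in range(n_toys))


def p_value(h0_sorted, threshold):
    """Fraction of H0 pseudo-experiments at or below the threshold (signal-like side)."""
    return bisect.bisect_right(h0_sorted, threshold) / len(h0_sorted)


def median_expected_p_value(s, b, n_toys=20000, seed=1):
    rng = random.Random(seed)
    h0 = toy_distribution(b, s, b, n_toys, rng)        # background only
    h1 = toy_distribution(s + b, s, b, n_toys, rng)    # signal plus background
    return p_value(h0, statistics.median(h1))


if __name__ == "__main__":
    print(median_expected_p_value(s=30.0, b=100.0))
```

Signal-like outcomes give small values of $-2\ln Q$, which is why the p-value integrates the $H_0$ distribution to the left of the reference value.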



TABLE IV: Expected and observed limits for each lepton category, single- and double-tagged events, in units of the SM cross section
for $WW + WZ$ production multiplied by the branching fraction into heavy flavors.

Tag and lepton category                 Expected limit            Observed limit
1 tag Tight Leptons (CEM+CMUP+CMX)      (illegible)               2.07
1 tag EMC                               (illegible)               1.70
2 tags Tight Leptons (CEM+CMUP+CMX)     (illegible)               3.03
2 tags EMC                              (illegible)               8.59
All combined                            $0.57^{+0.33}_{-0.31}$    1.46
FIG. 4: $M_{inv}(jet1, jet2)$ distribution for the 2 SecVtx tag candidates, extended muon categories (EMC). The best-fit
values of the systematic nuisance parameters are taken into account.



[Figure: CDF Run II Preliminary (7.5 fb$^{-1}$); horizontal axis: test statistic $-2\ln Q$]

FIG. 5: Distributions of $-2\ln Q$ for the test hypothesis $H_1$, which assumes Standard Model backgrounds plus Standard
Model $WW + WZ$ production and decay into heavy flavors (blue histogram), and for the null hypothesis $H_0$, which
assumes no $WW + WZ$ (red histogram). The observed value of $-2\ln Q$ is indicated with a solid, vertical line. The plot
is shown on a logarithmic scale. The p-value is the fraction of the integral of the $H_0$ curve to the left of the data.



B. WW + WZ cross section measurement 



In order to measure the $WW + WZ$ production cross section, a Bayesian marginalization technique is applied
to the $M_{inv}(jet1, jet2)$ distribution in both the 1-tag and 2-tag samples. The nuisance parameters are integrated
out as described in [H]. The distribution of the posterior is shown in Fig. 6. The maximum of the posterior
is taken to be the best-fit value for the cross section measurement, and the $1\sigma$ confidence interval is taken to
be the shortest interval containing 68% of the integral of the posterior distribution. The resulting cross section
measurement, in units of expected SM signal, is: I.OSS^q ^q. 
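
The extraction of the best-fit value and of the shortest 68% interval from a posterior can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the posterior is tabulated on an equally spaced grid and is unimodal, and the Gaussian used in the usage example is a toy stand-in, not the measured posterior. The function name is hypothetical.

```python
import math


def mode_and_shortest_interval(xs, density, content=0.68):
    """Return (mode, low, high) for a unimodal posterior tabulated on a uniform grid.

    xs: equally spaced grid points; density: (unnormalized) posterior values.
    """
    total = sum(density)
    # Rank grid points from most to least probable and accumulate until the
    # requested fraction of the posterior integral is enclosed.
    order = sorted(range(len(xs)), key=lambda i: density[i], reverse=True)
    accumulated, selected = 0.0, []
    for index in order:
        selected.append(index)
        accumulated += density[index]
        if accumulated >= content * total:
            break
    return xs[order[0]], xs[min(selected)], xs[max(selected)]


if __name__ == "__main__":
    # Toy posterior: Gaussian centered at 1.0 with width 0.3, restricted to x >= 0.
    grid = [i * 0.001 for i in range(3001)]
    toy_posterior = [math.exp(-0.5 * ((x - 1.0) / 0.3) ** 2) for x in grid]
    mode, low, high = mode_and_shortest_interval(grid, toy_posterior)
    print(round(mode, 3), round(low, 3), round(high, 3))
```

Selecting grid points in order of decreasing density yields the shortest interval only for a unimodal posterior; a multimodal posterior would return the envelope of disjoint regions.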



V. CONCLUSION 



We analyzed 7.5 fb$^{-1}$ of data looking for the $WW/WZ \to \ell\nu$ + HF signal in the exclusive $W$ + 2 jets sample.
We found an excess over the background in the $M_{inv}(jet1, jet2)$ distribution of the combined double-tagged and
single-tagged samples. The background-only hypothesis is inconsistent with the data: the significance
of the observed signal corresponds to $3.03\sigma$. We performed a measurement of the cross section for this process,
that we found to be 1.085 f^'l^ times the expected Standard Model prediction. 
[Figure: CDF Run II Preliminary (7.5 fb$^{-1}$)]

FIG. 6: The Bayesian posterior, marginalized over nuisance parameters, is shown. The maximum value is the central
value of the cross section. The blue area represents the smallest interval enclosing 68% of the integral of the posterior.